9 research outputs found

    Density estimation on an unknown submanifold

    We investigate density estimation from an $n$-sample in the Euclidean space $\mathbb R^D$, when the data is supported by an unknown submanifold $M$ of possibly unknown dimension $d < D$ under a reach condition. We study nonparametric kernel methods for pointwise and integrated loss, with data-driven bandwidths that incorporate some learning of the geometry via a local dimension estimator. When $f$ has Hölder smoothness $\beta$ and $M$ has regularity $\alpha$ in a sense to be defined, our estimator achieves the rate $n^{-(\alpha \wedge \beta)/(2(\alpha \wedge \beta)+d)}$, which does not depend on the ambient dimension $D$ and is asymptotically minimax for $\alpha \geq \beta$. Following Lepski's principle, a bandwidth selection rule is shown to achieve smoothness adaptation. We also investigate the case $\alpha \leq \beta$: by estimating in some sense the underlying geometry of $M$, we establish in dimension $d=1$ that the minimax rate is $n^{-\beta/(2\beta+1)}$, proving in particular that it does not depend on the regularity of $M$. Finally, a numerical implementation is conducted on some case studies in order to confirm the practical feasibility of our estimators.
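
    As a rough illustration of why the rate above involves the intrinsic dimension $d$ rather than $D$, the sketch below evaluates a Gaussian kernel estimator normalised by $h^d$ on points drawn uniformly from the unit circle ($d = 1$, $D = 2$). It is not the authors' estimator: the function name `manifold_kde`, the fixed bandwidth `h = 0.1`, and the assumption that `d` is known are all illustrative choices, whereas the abstract describes data-driven bandwidths and a local dimension estimator.

```python
import math
import random


def manifold_kde(x, sample, h, d):
    """Gaussian kernel estimate at x, normalised by h**d.

    d is the intrinsic dimension of the support (assumed known here),
    not the dimension of the ambient space the points live in.
    """
    norm = (2.0 * math.pi) ** (-d / 2.0) / (len(sample) * h ** d)
    total = 0.0
    for p in sample:
        # Squared Euclidean (ambient) distance between p and x.
        sq_dist = sum((a - b) ** 2 for a, b in zip(p, x))
        total += math.exp(-sq_dist / (2.0 * h * h))
    return norm * total


# Hypothetical test case: uniform sample on the unit circle in R^2.
rng = random.Random(0)
n = 20000
angles = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
sample = [(math.cos(t), math.sin(t)) for t in angles]

estimate = manifold_kde((1.0, 0.0), sample, h=0.1, d=1)
true_density = 1.0 / (2.0 * math.pi)  # uniform density w.r.t. arc length
print(estimate, true_density)
```

    With this normalisation the estimate should land near $1/(2\pi)$, the density with respect to arc length; dividing by $h^D$ instead would rescale the result by a factor of $1/h$ in this example.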

    Theoretical Foundations of Ordinal Multidimensional Scaling, Including Internal and External Unfolding

    We provide a comprehensive theory of multiple variants of ordinal multidimensional scaling, including external and internal unfolding. We do so in the continuous model of Shepard (1966).

    Estimating the Reach of a Manifold via its Convexity Defect Function

    The reach of a submanifold is a crucial regularity parameter for manifold learning and geometric inference from point clouds. This paper relates the reach of a submanifold to its convexity defect function. Using the stability properties of convexity defect functions, along with some new bounds and the recent submanifold estimator of Aamari and Levrard [Ann. Statist. 47 177–204 (2019)], an estimator for the reach is given. A uniform expected loss bound over a C^k model is found. Lower bounds for the minimax rate for estimating the reach over these models are also provided. The estimator almost achieves these rates in the C^3 and C^4 cases, with a gap given by a logarithmic factor.

    Inférence statistique sur des variétés inconnues (Statistical inference on unknown manifolds)

    In high-dimensional statistics, the manifold hypothesis presumes that the data lie near low-dimensional structures, called manifolds. This assumption helps explain why machine learning algorithms work so well on high-dimensional data, and is satisfied for many real-life data sets. We present in this thesis some contributions regarding the estimation of two quantities in this framework: the density of the underlying distribution, and the reach of its support. For the problem of reach estimation, we suggest different strategies based on important geometric invariants, namely the convexity defect functions and measures of metric distortion, from which we derive minimax-optimal rates of convergence. Regarding the problem of density estimation, we propose two approaches: one relying on the frequentist study of a kernel density estimator, and a Bayesian nonparametric approach based on location-scale mixtures of Gaussians. Both methods are shown to be optimal in most settings, and adaptive to the smoothness of the density. Lastly, we examine the behavior of some centrality measures in random geometric graphs, the study of which, although unrelated to the manifold hypothesis, bears methodological and theoretical implications that can be of interest in any statistical framework.

    Estimating a density near an unknown manifold: a Bayesian nonparametric approach

    We study the Bayesian density estimation of data living in the offset of an unknown submanifold of the Euclidean space. In this perspective, we introduce a new notion of anisotropic Hölder regularity for the underlying density and obtain posterior rates that are minimax optimal and adaptive to the regularity of the density, to the intrinsic dimension of the manifold, and to the size of the offset, provided that the latter is not too small, while still allowed to go to zero. Our Bayesian procedure, based on location-scale mixtures of Gaussians, appears to be convenient to implement and yields good practical results, even for quite singular data.